Development of a Bayesian classifier for breast cancer risk stratification: a feasibility study.

BACKGROUND
Lifetime risk assessment tools are relatively limited in identifying breast cancer risk in younger women. The predictive value of mathematical models to estimate risk varies according to age, menopausal status, race/ethnicity, and family history. Current risk prediction models estimate population, not individual, levels of breast cancer risk; hence, individualized risk prediction models are needed to identify younger at-risk women who could benefit from timely risk reduction interventions. Clinical data collected as part of breast cancer screening studies may be modeled using Bayesian classification.


PURPOSE
To train a proof-of-concept Bayesian classifier for breast cancer risk stratification.


PATIENTS AND METHODS
We trained a Bayesian belief network (BBN) model on cohort data (including risk factors, demographic, electrical impedance scanning (EIS), breast imaging, and biopsy data) from a prospective pilot screening trial in younger women (N = 591). Receiver operating characteristic curve analysis and cross-validation of the model were used to derive preliminary guidance on the robustness of this approach and to gain insights into what a cross-validation exercise could provide in terms of risk stratification in a larger population.


RESULTS
Independent predictors of biopsy outcome in the BBN model included personal breast disease history, breast size, EIS (low vs high risk) and imaging results, and Gail cutoff (5-year risk: <1.66% vs ≥1.66%). Area under the receiver operating characteristic curve and positive predictive value were 0.88 and 97% for benign biopsy outcomes and 0.97 and 42% for malignant biopsy outcomes, respectively. Patient-specific probability of biopsy outcome given positive EIS result and Gail model 5-year risk ≥1.66% indicated that the combined effect of these predictors on likelihood that a biopsy would prove malignant exceeded the sum of the individual effects; breast cancer likelihood is as follows: 3% (EIS negative and Gail model 5-year risk <1.66%) versus 9% (EIS positive and Gail model 5-year risk <1.66%) versus 27% (EIS negative and Gail model 5-year risk ≥1.66%) versus 45% (EIS positive and Gail model 5-year risk ≥1.66%).


CONCLUSION
Clinical data collected as part of breast cancer screening studies can be modeled using Bayesian classification. The BBN model may be predictive and may provide clinically useful incremental risk information for individualized breast cancer risk assessment in younger women.

ePlasty VOLUME 10
Breast carcinoma is the most commonly diagnosed cancer and the second leading cause of cancer-related mortality among women in the United States. 1 In 2009, there were more than 192 000 estimated new cases of cancer of the breast and more than 40 000 disease-specific deaths. 1 Breast cancer-related mortality rates have steadily decreased over the past 2 decades, largely because of improved disease detection and therapy. 2 Because breast cancer in younger women (age <40 years) is infrequently diagnosed at an early stage under current screening guidelines, improved screening and detection methods are a research priority, particularly for younger, at-risk women. 3 Breast cancer in younger women typically has unfavorable prognostic characteristics associated with increased disease-specific mortality; hence, early detection in younger women is imperative. 4-6 Younger women are not referred for periodic imaging unless they are identified as being "high risk." 7 "At risk" younger women with significant family history or genetic factors are encouraged to undergo frequent clinical and annual breast imaging surveillance and to consider chemoprevention.
While increased surveillance for at-risk women may be beneficial, the value of this approach is restricted by the rarity of breast cancer due to known genetic risk factors. 8,9 More than 90% of breast cancers are detected in women who are not identified as being high risk. 3 Furthermore, screening mammography is generally less accurate in younger women and those with increased breast tissue density commonly encountered in women younger than 40 years. 10 The reduced sensitivity of mammography for dense breasts impacts age groups in which a "life saved" often results in "higher" personal and societal costs in terms of altered life expectancy and personal productivity. 11 Magnetic resonance imaging (MRI) is being used increasingly as a screening modality in high-risk women with a significant family history of breast cancer or those with BRCA1 or BRCA2 gene mutations, resulting in lifetime risk of cancer exceeding 20%. 12,13 Hence, breast MRI is currently applied to a relatively small proportion of all women. Because MRI is unaffected by breast tissue density, it is appealing to consider its use for screening young women in general; however, the high cost, requirement for intravenous contrast administration, and variable specificity limit its feasibility for widespread population-based screening. 14,15 Therefore, improved methods for risk prediction in younger women are needed to identify those at high risk for breast cancer.
Tamoxifen may be considered in both premenopausal and postmenopausal women, and raloxifene may be considered in postmenopausal women, with lobular carcinoma in situ (LCIS) or with a 5-year breast cancer risk estimate of 1.66% or higher (according to the Gail model or the National Cancer Institute Breast Cancer Risk Assessment Tool), in order to reduce the risk of estrogen receptor-positive breast cancer. 16 In the National Surgical Adjuvant Breast and Bowel Project (NSABP) P-1 study, tamoxifen (20 mg/day for 5 years) consistently reduced the incidence of breast cancer by 49% in at-risk women across all study age and risk groups (women aged 35 to 59 years with a ≥1.66% risk, those 60 years or older, or those with prior LCIS), thereby demonstrating the efficacy of chemoprevention for this disease. 17 The Multiple Outcomes of Raloxifene Evaluation (MORE), Continuing Outcomes Relevant to Evista (CORE), Raloxifene Use for the Heart (RUTH), and NSABP Study of Tamoxifen and Raloxifene (STAR) trials demonstrated consistent significant reductions in estrogen receptor-positive breast cancer incidence in at-risk postmenopausal women. 16 Subsequent analyses of the NSABP P-1 study data suggested improved quality-adjusted survival and cost-effectiveness when tamoxifen was initiated as early as age 35 years in at-risk (Gail model 5-year risk ≥1.66%) women. 18,19 Hence, identification of women who are at high risk and may benefit from chemoprevention is of particular importance.
Lifetime relative risk assessment tools (eg, Gail model) are available to identify women older than 35 years who are at risk for breast cancer. However, the predictive value of mathematical models to estimate breast cancer risk varies according to age, menopausal status, race/ethnicity, and family history of breast cancer. Instruments such as the Gail model are imperfect for identifying increased cancer risk in younger women. 20 Importantly, all current risk prediction models estimate population, not individual, levels of breast cancer risk. Currently, the only criterion generally used to identify high-risk young women who could benefit from chemoprevention is family/genetic history. The value of this risk estimation paradigm is limited by the rarity of breast cancer due to known gene mutations. Better individualized risk prediction models are needed to identify younger at-risk women who could benefit from risk reduction interventions and earlier chemoprevention.
Bayesian belief network (BBN) models have been used to better understand research data and biologic systems, for example in functional genomics. In recent years, they have also been applied to clinical problems, such as estimating breast cancer risk from mammographic microcalcifications and predicting false-positive mammograms. 21,22 We believe that clinical data collected as part of breast cancer screening studies may be modeled using Bayesian classification. The objective of this study was to train a proof-of-concept machine-learned BBN model based on previously unpublished cohort data from a prospective pilot screening trial and to perform cross-validation in order to evaluate the feasibility of using readily available data (including risk factors, demographic data, breast impedance, and breast imaging) and Bayesian classification for breast cancer risk stratification (estimating biopsy outcome) in younger women.

Patients
Between August 2002 and March 2005, a total of 591 female military healthcare beneficiaries were enrolled into this institutional review board-approved, single-arm, prospective pilot screening trial. The clinical protocol was reviewed and approved by the institutional review boards of the Walter Reed Army Medical Center, Washington, DC, and the Keller Army Hospital, West Point, NY. Subjects who met the eligibility criteria were offered participation in this study. Subjects were recruited from the gynecology clinic or the Comprehensive Breast Center at Walter Reed Army Medical Center or the gynecology or family practice clinic at Keller Army Hospital. Study inclusion criteria consisted of younger women aged 18 to 49 years who provided informed consent prior to study enrollment and who were willing to be followed at the participating institution. Age was stratified for analysis as follows: younger than 30, 30 to 34, 35 to 39, and 40 to 49. Potential study subjects were excluded if they had breast surgery (including core biopsy) or were lactating within the preceding 3 months, had breast fine needle aspiration within the preceding 1 month, were pregnant, had electrically powered implanted devices (eg, pacemaker), or were undergoing chemotherapy or radiation treatment. Data collected for each study subject included age, race/ethnicity, clinical history (personal and family history of breast cancer, previous breast surgery or biopsy, and results of those interventions), hormonal information (age of menarche and first full-term pregnancy, menstrual status, date of last menstrual period, and exogenous hormone use), breast density and size (bra cup size), Gail model risk estimate, results of clinical breast examination (CBE), screening breast electrical impedance scanning (EIS), conventional imaging, and biopsy results. All study participants underwent EIS of the breast by using the T-Scan™ 2000ED (Mirabel Medical, Austin, TX) as previously described. 23

Statistical Methods
The BBN model was trained by using a priori variables to estimate the likely diagnostic outcome of breast biopsy. The BBN model was developed by using commercially available machine-learning algorithms (FasterAnalytics, DecisionQ, Washington, DC), which automatically learn network structures and joint probabilities from the prior probabilities in the data. BBN models are a type of directed acyclic graph, which means that they represent information in a hierarchical format. BBN models allow us to identify those variables that contain the most information and are thus most useful for estimating outcomes. The associations represented by BBN models are associations of conditional dependence, allowing us to estimate the posterior likelihood of a given outcome given prior observations.
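As a minimal illustration of what "posterior likelihood of an outcome given prior observations" means, the sketch below applies Bayes' rule to a single predictor (the EIS result), using the biopsy counts reported in the Results of this article. It is not the study's BBN, which was learned with commercial software over many variables; the function name `posterior` and the label "abnormal" (premalignant or malignant) are introduced here for illustration only.

```python
from fractions import Fraction

# Biopsy counts reported in the Results (biopsied subjects only).
# "abnormal" is shorthand used here for premalignant or malignant histopathology.
counts = {
    ("EIS+", "abnormal"): 5,
    ("EIS+", "benign"): 5,
    ("EIS-", "abnormal"): 8,
    ("EIS-", "benign"): 87,
}


def posterior(outcome, evidence):
    """Return P(outcome | evidence) as likelihood * prior / marginal."""
    total = sum(counts.values())
    n_outcome = sum(n for (e, o), n in counts.items() if o == outcome)
    n_evidence = sum(n for (e, o), n in counts.items() if e == evidence)
    prior = Fraction(n_outcome, total)                           # P(outcome)
    likelihood = Fraction(counts[(evidence, outcome)], n_outcome)  # P(evidence | outcome)
    marginal = Fraction(n_evidence, total)                       # P(evidence)
    return likelihood * prior / marginal


ppv = posterior("abnormal", "EIS+")  # chance of abnormal histology after a positive EIS
npv = posterior("benign", "EIS-")    # chance of benign histology after a negative EIS
print(float(ppv), round(float(npv), 2))
```

With these counts the single-predictor posteriors reduce to the 50% positive predictive value and roughly 92% negative predictive value reported for EIS; a BBN generalizes the same calculation to several conditionally dependent predictors at once.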
In order to refine the model, a stepwise training process was used. Quantitative and qualitative assessments were used to optimize variable preparation and selection in order to produce the most robust and useful model. The objective was to produce the optimum biopsy outcome estimate through iterative quality assurance and reduction of confounding information. The process used to develop the model is summarized as follows: (1) preliminary modeling to identify appropriate machine-learning parameters, data quality issues, and confounding features and feature analogs that reduce model accuracy, (2) global modeling to set appropriate machine-learning parameters, remove identified analogs and confounders, and perform full "queue learning" to observe global data structure, (3) naive modeling of the outcome of interest to identify the relative contribution of covariates, and (4) focused modeling by using "queue learning" on subsets of variables identified in the prior steps to derive a more focused BBN model than that obtained in global modeling. By excluding marginal or noncontributory variables, the remaining ones are explored more exhaustively.
Cross-validation was performed on the final focused Bayesian classifier by using a train-and-test cross-validation methodology to produce classification accuracy estimates. Fivefold cross-validation was performed by randomizing the data set into 5 separate and unique train-and-test sets. Each set consists of a training set composed of 90% of patient records and a test set consisting of the remaining 10% of records. Once the model was constructed with a training set, the matching test set was entered into the model, generating a case-specific prediction for each record for independent variables of interest. A receiver operating characteristic curve was plotted for each test to calculate classification accuracy. The receiver operating characteristic curve was used to calculate area under the curve, a metric of overall model quality, and to calculate corresponding predictive values for biopsy outcome.
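The train-and-test procedure can be sketched in a model-agnostic way. The code below is an illustration under stated assumptions, not the study's software: it draws 5 independent random 90%/10% splits as described above (whether the study's test sets were mutually disjoint is not stated) and computes the area under the receiver operating characteristic curve with the rank-based (Mann-Whitney) formulation. The names `random_splits` and `roc_auc` are hypothetical helpers introduced here.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        # With rare outcomes a small test set can lack one class entirely.
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_splits(n_records, n_splits=5, test_fraction=0.10, seed=0):
    """Yield (train_indices, test_indices) from independent random shuffles."""
    rng = random.Random(seed)
    n_test = max(1, round(n_records * test_fraction))
    for _ in range(n_splits):
        idx = list(range(n_records))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# Example: split sizes for a cohort of 591 records.
for train_idx, test_idx in random_splits(591):
    print(len(train_idx), len(test_idx))
```

In use, a model would be fitted on the records in `train_idx`, asked for a predicted probability for each record in `test_idx`, and `roc_auc` applied to the test labels and predictions. With few malignant outcomes per test set, the resulting AUC can vary considerably from split to split.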

RESULTS
The study population comprised an ethnically diverse group of younger women (41% non-Caucasian), healthcare beneficiaries in a free access system of military medical care. Of the 591 study participants, 67% were younger than 40 years (mean age: 35 ± 6.9 years) and 90% were premenopausal (Table 1). Two percent of the study population was taking exogenous hormones at the time of study enrollment; however, there was no statistically significant association with disease (P = .95). Fifty-five percent of participants had no family history of breast cancer, and family history was only marginally associated with biopsy outcome (P = .10). The findings of CBE were statistically associated with both age (P = .01) and disease (P ≤ .001); 31% of subjects had findings that were deemed not suspicious, whereas 4% of subjects had suspicious CBE findings. Five percent of study subjects had an estimated 5-year risk of breast cancer ≥1.66% according to the Gail model, and these findings were statistically associated with both disease and age of subject (P ≤ .001). Mammography was performed in 281 women and was found to be Breast Imaging Reporting and Data System (BIRADS) III or higher in 75 cases (27%); mammography was found to be statistically associated with both disease and age of subject (P ≤ .001). Breast ultrasound examination was performed in 258 women and was found to be BIRADS III or higher in 66 cases (26%); ultrasound was found to be statistically associated with disease (P ≤ .001) but not with age (P = .18).
We also studied other well-known risk factors in our population and identified 3 risk factors that were not statistically associated with biopsy outcome: mean age at menarche (P = .12), mean age at first pregnancy (P = .39), and nulliparity (P = .93). Finally, there was no statistically significant difference between the mean age of our population (35 years) and the mean age at time of cancer diagnosis (38 years, P = .35) or diagnosis of premalignant histopathology (38 years, P = .56). We tabulated data by age group and biopsy outcome as shown in Tables 1 and 2. Of the 591 women enrolled in the study, 568 were found to be EIS negative (low risk) and 23 were found to be EIS positive (high risk). In the EIS-negative group, 95 underwent biopsy and 87 were benign on final histopathology. The remaining 8 were either premalignant (n = 4) or malignant (n = 4). In the EIS-positive group, 10 underwent biopsy; 5 were benign, whereas 5 were either premalignant (n = 3) or malignant (n = 2). Of 13 premalignant or malignant lesions, EIS identified 5 (38.5%). The negative predictive value (NPV) of the EIS-negative group was 92%, whereas the positive predictive value (PPV) of the EIS-positive group was 50%. We trained a proof-of-concept BBN model on this pilot cohort data and performed analysis and cross-validation. The Bayesian network shown in Figure 1 indicates that the six nearest independent associated features (direct relationship to breast biopsy diagnosis) used to estimate a breast biopsy diagnosis (Biopsy category) are screening breast EIS result, Gail model cutoff (5-year risk estimate <1.66% vs ≥1.66%), mammogram BIRADS result, MRI BIRADS result, breast size, and personal history of breast disease.
This does not mean, however, that "Any Palpable Mass" on CBE and ultrasound BIRADS results (indirect relationship to breast biopsy diagnosis; Fig 1) do not influence the estimate of likely biopsy diagnosis, but rather that they are conditionally independent of biopsy outcome, given knowledge of screening breast EIS and mammogram BIRADS results. The BBN feasibility model was validated using train-and-test cross-validation and produced strongly predictive areas under the curve (0.75-0.97) for differentiating malignancy and premalignant disease from benign findings (Table 3). Cross-validation also produced a 97% NPV and a 42% PPV for malignancy. It is important to note that with a relatively small set of outcomes, there is a high degree of variance in results between cross-validation exercises (Table 3). The BBN model is a recursive information structure, and the inclusion of conditional dependence between predictive variables guards against overinterpretation of data (overfitting). The model informs estimates not only through estimation of biopsy outcome but also through estimation of as-yet-unknown imaging results, wherein estimates of biopsy outcome can be derived from available clinical and imaging data, even if some imaging studies are unavailable at the time of biopsy outcome estimation.
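The conditional-independence statement about CBE and ultrasound can be written compactly. The symbols are notation introduced here for illustration, not taken from the study: B is the biopsy category, E the screening EIS result, M the mammogram BIRADS result, C the "Any Palpable Mass" finding on CBE, and U the ultrasound BIRADS result.

```latex
P(B \mid E, M, C, U) = P(B \mid E, M)
\qquad \text{while, in general,} \qquad
P(B \mid C, U) \neq P(B)
```

In words: once E and M are known, C and U add no further information about B, yet when E or M is unknown, C and U still shift the estimate of B through their influence on those intermediate nodes.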
To demonstrate the use of this type of model, we walk through an example case of how we can use available information to estimate clinically relevant outcomes using the network. Knowledge of breast size (bra cup) B (Fig 2: Evidence 1) results in slightly lower risk of cancerous biopsy result (−3.6%) for the subject compared with our reference population. When the additional knowledge of Gail model 5-year high-risk estimate (Fig 3: Evidence 2) is added to refine the posterior estimate of biopsy outcome given previously known breast size (bra cup) B, there is a 12% increased likelihood of cancerous biopsy and a 17% increase in the likelihood of premalignant histology, relative to our overall study cohort. Finally, adding knowledge of a positive (high risk) EIS screening result (Fig 4: Evidence 3) increases the posterior risk estimate of cancerous biopsy by 21% and the risk estimate of premalignant disease by 35%. Each posterior probability estimate is the result of adding evidence for each factor being used to make the estimate. As new evidence is "added" for a given factor, the existing evidence already input remains unchanged. However, other nodes in the network for which no evidence is available have their posterior probabilities updated given other evidence that has been input, and these new posteriors in turn influence the predicted variable. As all features in the model are, at some level, conditionally dependent on biopsy outcome, those features available at the time of initial clinical visit (a priori knowledge) can be selected and applied to the model to estimate biopsy outcome. Subsets of features can also be used to generate an inference table (Table 4) that can be used to quickly estimate biopsy outcome for all known combinations of the identified features. The incremental values of both screening breast EIS and the Gail model data are shown in the inference table, Table 4.
Under the most favorable circumstances (EIS negative and Gail model 5-year risk <1.66%) the risk of malignancy is 3%, and under the least favorable circumstances (screening EIS positive and Gail model 5-year risk ≥1.66%), the risk of malignancy is 45%.
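An inference table of this kind is a lookup from predictor combinations to a posterior probability. The sketch below encodes the four malignancy likelihoods reported in this article for the EIS and Gail model combinations and checks the arithmetic behind the statement that the combined effect exceeds the sum of the individual effects. The dictionary layout and variable names are illustrative choices made here, not the structure of Table 4 itself.

```python
# Likelihood (%) that a biopsy proves malignant, as reported in the text,
# keyed by (EIS positive?, Gail model 5-year risk >= 1.66%?).
pct_malignant = {
    (False, False): 3,
    (True, False): 9,
    (False, True): 27,
    (True, True): 45,
}

baseline = pct_malignant[(False, False)]

# Increase over baseline, in percentage points, for each predictor alone.
eis_alone = pct_malignant[(True, False)] - baseline
gail_alone = pct_malignant[(False, True)] - baseline

# Increase over baseline when both predictors are unfavorable.
combined = pct_malignant[(True, True)] - baseline

# Positive excess means the joint effect is more than additive.
excess = combined - (eis_alone + gail_alone)
print(eis_alone, gail_alone, combined, excess)
```

Measured against the 3% baseline, a positive EIS result alone adds 6 percentage points and a high Gail estimate alone adds 24, whereas both together add 42, which is 12 points more than the 30 that simple addition would give.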

DISCUSSION
For the majority of younger women, namely those considered to be average risk for developing breast cancer under the current risk assessment and screening model, the only generally available risk assessment modality is CBE, which is imperfect as a screening tool as it has an unacceptably low sensitivity and high false-positive rate compared with mammography. 24 CBE detects cancers only when they have advanced to the point of being palpable. When cancers are clinically palpable, they have reached a more advanced stage of disease. Palpable breast cancers typically require more aggressive and costly treatments, with concomitant worse quality of life and oncological outcomes. 25,26 The early detection of breast cancer in younger women is very important, particularly because it demonstrates aggressive tumor biology with rapid tumor growth, demonstrates a relatively short preclinical disease phase, and has worse cancer-specific survival than in older women. 6,27,28 A risk stratification paradigm that improves upon the interpretation of existing clinical information should allow us to detect disease at an earlier stage of development. The Gail model offers an improvement in predicting risk, yet it is still an imperfect tool because it was developed using a primarily older, Caucasian population. Further, as more effective personalized detection, prevention, and treatment strategies become available for breast cancer in younger women, strategies and technologies that support truly personalized risk assessment and screening can favorably impact survival, especially if conducted at shorter intervals than in older women. 28-33 Recognizing the need for individualized breast cancer risk assessment tools for younger women, we conducted a feasibility study to determine whether a machine-learned BBN model could be developed to support individualized breast cancer risk stratification.
The model trained and cross-validated in this study was based on data from a prospective pilot screening trial in younger women (N = 591) and produced receiver operating characteristic curves, when cross-validated, with areas under the curve of 0.88, 0.97, and 0.75 for benign, malignant, and premalignant findings, respectively. This proof-of-principle study shows that clinical data collected as part of routine, current breast cancer screening studies can be developed into enhanced screening tools with improved sensitivity and specificity by using machine-learned BBN models. These networks can use readily available information to estimate clinically relevant outcomes, providing clinically useful incremental risk information for individualized breast cancer risk assessment in younger women. In our BBN model, the features showing direct conditional dependence with biopsy outcome include (1) personal history of breast disease, (2) breast size (bra cup), (3) EIS (low vs high risk), (4) breast imaging results, and (5) Gail model risk cutoff (5-year risk <1.66% vs ≥1.66%). Each of these variables is also significantly related to biopsy category when examined using the χ² test. CBE and breast ultrasound results were also determined to be statistically associated with biopsy outcome, and the Bayesian model includes these features as well, but they are associated with biopsy outcome through EIS and mammography results. Other features that showed statistical significance, but dropped out of the training process in the model, include patient ethnicity, menopausal status, and prior breast biopsy. The machine-learning process is designed to produce a simpler (parsimonious) model whenever possible; thus, these additional factors were likely surpassed by more specific imaging and personal history risk factors.
Interestingly, a number of attributes were found to have statistically significant association with patient age, including CBE findings, mammography BIRADS category, nulliparity, and Gail model 5-year risk score. Conversely, certain factors found to be significantly associated with biopsy outcome, using both bivariate statistical tests and the machine-learning process, were not associated with patient age: breast (bra cup) size, EIS screening examination, or MRI. There was no statistically significant difference in mean age at diagnosis of premalignant or malignant disease compared with the mean age of our study population, and when biopsy results were examined by age category, they still did not demonstrate any statistically significant associations. Finally, features considered as well-established breast cancer risk factors in the general population were not found to have statistically significant associations with biopsy outcome in our younger population, including family history of breast cancer, age at menarche, nulliparity, and age at first pregnancy. While these are considered as common risk factors for developing breast cancer, data suggest that these variables are not effective to determine individual risk of cancerous or premalignant lesions in our study population. Although we must be cautious not to overinterpret these findings, they do raise important questions about the appropriate risk measures in a younger population of ethnically diverse women when well-established risk factors have statistically significant association with subjects' age while our disease outcome appears to be age independent.
Having trained, encoded, and validated the machine-learned BBN model, we can estimate the likely biopsy outcome given readily available clinical and imaging data. However, the BBN model not only allows the posterior estimation of the likely biopsy outcome but also identifies a hierarchy of conditional dependence that allows us to identify which pieces of information are most useful in calculating our estimate. This hierarchy also defines how independent variables influencing biopsy outcome influence one another, providing a better understanding of how the estimate is derived and providing an opportunity to estimate missing parameters by using those currently available for any given patient. It is notable that the combined effect of these independent predictors on the likelihood of disease is greater than the sum of the individual effects. By way of example, a mammography finding of BIRADS IV increases the likelihood of a malignant biopsy result in our study population by 5%, whereas a Gail model 5-year risk score of 1.66% or greater increases the likelihood of malignancy by 26%, yet together these findings increase the likelihood of disease by 42%, which is greater than the 31% sum of their individual effects.
Importantly, the current proof-of-principle model estimates probability of having a breast abnormality biopsied and having it show the underlying malignancy. The ultimate clinical utility of such a model with appropriate sample size, population disease incidence, and follow-up will be based on robust predictive value of the model for developing breast cancer. The most clinically relevant model will be based on easily obtainable nonimaging parameters to identify at-risk women, who could benefit from breast imaging-based screening and risk reduction interventions.

CONCLUSION
A need exists for a breast cancer risk estimation paradigm that can be used along with relevant demographic, clinical, and other readily obtainable patient-specific data in younger women in order to provide an individualized cancer risk assessment, direct screening efforts that can lead to prophylaxis, and detect breast cancer at an early stage. The computational complexity of designing such risk stratification algorithms for the average-risk woman necessitates a large, multidimensional cohort and requires a selection and encoding methodology that is both robust and transparent so as to sustain clinical scrutiny and improve clinical practice. For our study, we integrated multidimensional clinical, imaging, and pathological data from a prospective cohort to test the feasibility of model development.
Using this cohort, we trained a BBN model by using machine-learning algorithms and developed a risk classification model for the average-risk younger woman, with promising cross-validation results. Our proof-of-concept study shows that this type of model could be used to perform individualized screening on a regular basis, using available clinical data at low cost, and that it warrants further assessment and independent testing.